Indonesian Journal of Electrical Engineering and Computer Science 
Vol. 14, No. 3, June 2019, pp. 1228~1234 
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v14.i3.pp1228-1234


Design of parallel and pipelined DA based OBC FIR filter for software defined radio


M. Rajmohan, Himanshu Shekhar 


Hindustan Institute of Technology & Science, Chennai, India 








Article Info

Article history:

Received Nov 30, 2018
Revised Jan 21, 2019
Accepted Feb 28, 2019

Keywords:

Distributed arithmetic
FIR filter
Offset binary coding
SDR

ABSTRACT

Software Defined Radio (SDR) is a new technology used to implement different wireless
communication standards for mobile communication. The Intermediate Frequency (IF) block is the
most demanding block in a software defined radio. The most important task in the IF processing
block is digital filtering, which is carried out by a Finite Impulse Response (FIR) filter. One of
the major techniques for calculating the inner product is the Distributed Arithmetic (DA) based
FIR filter, which uses a Look Up Table (LUT) to eliminate the need for multipliers. The efficiency
of the DA filter degrades as the number of address lines increases and also because of its serial
operation. To overcome these problems, a parallel and pipelined DA filter using Offset Binary
Coding (OBC) with Two Bit At A Time (2-BAAT) processing is proposed. The proposed method
achieves less area, low power consumption and nominal delay for SDR applications.

1. INTRODUCTION

Today there is a need for wireless communication devices that can reconfigure themselves and
adapt to multiple wireless communication standards, air interfaces and waveforms. SDR is a fast
emerging technology in which most of the hardware components are replaced with software techniques
for digital radio signal processing. It is mainly used to add new features or capabilities to an
infrastructure currently in use by reconfiguring a single hardware platform. Figure 1 shows the
architecture of SDR baseband processing. Channelization in SDR extracts multiple narrowband
signals from a wideband signal using filters [1], [2].
Figure 1. SDR architecture for baseband signal processing

This paper is organized as follows. A detailed literature survey of the existing FIR filter
designs for SDR applications is presented in Section 1.1, followed by the problem statement in
Section 1.2. Section 2.1 describes the background of DA. The proposed method is discussed in
Section 2.2, followed by the performance analysis of the system in Section 3. Finally, the
conclusion is presented in Section 4.


1.1. Background 

An approach for obtaining a sharp cutoff in FIR filters using Frequency Response Masking (FRM)
is described in [3]. It enhances the Sample Rate Converter (SRC) and Digital Down Converters
(DDC). Filters with programmable coefficients that offer low power consumption are discussed in
[4]. For low complexity, the computation sharing multiplier (CSHM) is used. This technique
expresses the products in the FIR filter as a combination of shift and add operations. The
structure is also efficient for designing adaptive filters and for matrix multiplication.

The Pseudo Floating Point (PFP) method is discussed in [5] to reduce the number of adders.
Filters implemented with fixed point arithmetic in digital wideband receivers require a coefficient
word length of 24 bits to meet the desired channel specification. The filter's coefficient word
length is proportional to the number of adders used in the multiplier. The number of bits required
to encode 24 bit and 16 bit coefficients can be reduced significantly by the span reduction
technique. This method provides a 40% reduction in adders and an 80% reduction in full adders for
the implementation of multipliers when compared with existing methods.

A coefficient decimation method for designing low complexity, reconfigurable FIR filters is
described in [6]. The fundamental concept is to keep every Nth coefficient of the FIR filter
unchanged and replace all other coefficients by zero. The benefit of this method over Frequency
Response Masking lies in the transition-band width and passband width of the filter.
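
The coefficient decimation operation can be illustrated with a minimal Python sketch. The tap values and the helper names `decimate_coefficients` and `dtft` are hypothetical and are not taken from [6]; the sketch only shows the step of keeping every m-th coefficient and zeroing the rest, here with m = 2.

```python
import cmath
import math

def decimate_coefficients(h, m):
    """Keep every m-th coefficient of h and replace all others by zero."""
    return [c if n % m == 0 else 0.0 for n, c in enumerate(h)]

def dtft(h, omega):
    """Frequency response of the FIR filter h at angular frequency omega."""
    return sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h))

# Hypothetical symmetric low-pass taps, chosen only for illustration.
h = [0.02, 0.08, 0.17, 0.23, 0.23, 0.17, 0.08, 0.02]
h_dec = decimate_coefficients(h, 2)
print(h_dec)

# For m = 2 the decimated response is the average of the original response
# and a copy of it shifted by pi.
omega = 0.3
shifted_average = 0.5 * (dtft(h, omega) + dtft(h, omega - math.pi))
print(abs(dtft(h_dec, omega) - shifted_average) < 1e-12)
```

Because the same stored coefficients are reused, changing the decimation factor changes the frequency response without redesigning the filter, which is the kind of reconfigurability the method relies on.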

An efficient filter is designed in [7] by combining FRM and binary CSE. When the FRM method
is used together with multiplierless techniques, the filter complexity is reduced. This approach
provides multiple levels of reconfigurability: the filter can be reconfigured at the filter level
as well as at the architectural level.

A binary representation of the filter coefficients using the CSE technique is discussed in [8]
to reduce the number of adders in the multipliers. It reduces the number of Logical Operators
(LOs) for short filters. For higher order filters, it offers a greater reduction in LOs without
any change in the logic depth. Hence, this scheme is best suited to the implementation of higher
order filters.

The programmable shift method and the constant shift method are used in [9] to design low
complexity FIR filters. The complexity of the coefficient multiplier is decreased by using the
canonical signed digit (CSD) representation. In Common Subexpression Elimination (CSE), multiple
occurrences of identical bit patterns are identified and eliminated. This method is further
modified to reduce the complexity using binary CSE. In wireless communication, higher order FIR
filters are required to meet the channel attenuation specification. The resulting architectures
have lower complexity than existing techniques.
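
As a rough illustration of why the canonical signed digit form lowers multiplier complexity, the following Python sketch converts an integer coefficient into digits from {-1, 0, +1} with no two adjacent nonzero digits. The function name `to_csd` and the example coefficient are assumptions made for illustration; this is not the design procedure of [9].

```python
def to_csd(value):
    """Return the CSD digits of an integer, least significant digit first."""
    digits = []
    while value != 0:
        if value & 1:
            # +1 when value mod 4 == 1, -1 when value mod 4 == 3
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits

coefficient = 23                       # hypothetical integer coefficient
digits = to_csd(coefficient)
print(digits)                          # least significant digit first
print(sum(d << i for i, d in enumerate(digits)))   # reconstructs the coefficient
```

Each nonzero digit corresponds to one shifted addition or subtraction in a multiplierless coefficient multiplier, so 23 needs three terms (32 - 8 - 1) instead of the four terms of its plain binary form (16 + 4 + 2 + 1).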

A pipelined architecture is implemented in [10] to reduce the area, delay and power of a
modified DA. DA is well suited to the implementation of filters in FPGA hardware. The LUT in DA
is used to store pre-computed values so that they can be read out easily. A performance analysis
over various filter orders and address lengths identifies partial tables with 4 inputs as the
preferred option.

A bit serial DA based architecture with the OBC technique is designed in [11] for fixed
coefficients in a k-tap filter. Two smart methods for updating the LUT of an adaptive filter were
also discussed. This approach offers a reduction in hardware elements and improved computational
efficiency.

A reconfigurable FIR filter for an SDR receiver is described in [12]. For non-uniform
channelization, the multi-standard SDR needs a reconfigurable filter bank. A coefficient
decimation and FRM filter bank is suggested as a good trade-off between low complexity and
reconfigurability. This method provides reconfigurability and enhances the filter frequency
response.

A multiplierless baseband filter is designed in [13] to support different wireless
communication standards for SDR. The function of the channelizer in SDR is to convert wideband
signals into narrowband signals. To achieve this, variable bandwidth filters (VBFs) are designed.
The multiplierless filters are designed using the frequency response masking technique with
constant coefficients. The canonic signed digit representation is minimized using a modified
harmony search algorithm. This filter offers lower power consumption with reduced complexity.

A vertical-horizontal binary CSE method for constant multipliers is discussed in [14].
Multiple constant multiplication (MCM) is designed using graph based techniques and CSE.
Applications like Software Defined Radio (SDR) require dynamically programmable FIR filters in
which the coefficients change according to the standard in use. This method reduces the switching
activity of the adders in the multiplier block compared with the 2-bit and 3-bit binary CSE
algorithms.

A reconfigurable interpolation filter is designed in [15] for a digital up converter (DUC) with
reduced area and power. The number of additions and multiplications per input sample is reduced
by 83% compared with an individual filter implementation for each standard. The constant
multiplier is designed using a two-bit binary common subexpression (BCS) elimination algorithm.
The area is reduced by 41% and the power by 38% when compared with the 3-bit BCS algorithm.

A variable digital filter (VDF) is designed in [16] to extract individual radio channels
corresponding to multiple wireless communication standards. The VDF design is based on an
enhanced coefficient method. A pipelined technique is then applied to the VDF and analyzed. The
results show that the pipelined implementations achieve a 27.66% reduction in the number of
slices, a 49.17% reduction in the number of LUTs, and a 25.59% reduction in dynamic power
consumption when compared with the corresponding non-pipelined implementations. The pipelined
architecture also provides high operating frequencies.

A technique for reducing the area by replacing the long word-length Structural Adders (SAs)
with pre-SAs of shorter word length is discussed in [17]. The adders in the product accumulation
structure are referred to as SAs. The filter coefficients are grouped to take advantage of the
symmetry of FIR filters.

A reconfigurable interpolation Root-Raised Cosine (RRC) filter based on the binary CSE
technique is discussed in [18], [19]. The FIR filter coefficients can be altered at runtime for
multi-standard wireless communication systems. With the 4-bit binary CSE technique, the number of
BCS coefficients is reduced. Programmable adders are used to reduce the number of addition
operations, and an efficient polyphase interpolation architecture is implemented.


1.2. Problem Statement 
The major issue in SDR is the adaptation of the channel filter to different communication
standards, which depends mainly on the reconfigurability of the FIR filter.
a. The performance of the channelizer is determined by how efficiently the FIR filter is designed
for low power consumption, high speed, and minimum area.
b. The hardware complexity of a filter depends on its multipliers, adders and delay elements.
c. DA is bit serial in nature, which can limit the performance of the filter.


2. THE PROPOSED SOLUTION 

The proposed work is to design an FIR filter for the channelizer, which plays a crucial role in
SDR. DA is effective for the realization of real time filters in application specific integrated
circuits and Field Programmable Gate Arrays (FPGAs).

To increase the efficiency of OBC based DA, a bit pairing technique is used. The computation
speed is then increased by using parallel and pipeline techniques.


2.1. Distributed Arithmetic (DA) and OBC 

Distributed Arithmetic computes the Multiply Accumulate (MAC) operation without using
multipliers by converting it into shift and accumulate operations. This multiplierless technique
reduces the size of the hardware in an FPGA design. The expression for the MAC operation is
given as


$$y = \sum_{k=1}^{K} A_k x_k \tag{1}$$

where $x_k$ and $A_k$ denote the inputs and the fixed coefficients respectively.

If each $x_k$ is a 2's-complement binary number scaled such that $|x_k| < 1$, then each $x_k$ is
represented as

$$x_k = -b_{k0} + \sum_{n=1}^{N-1} b_{kn} 2^{-n} \tag{2}$$

where $b_{k0}$ is the sign bit. Substituting (2) in (1),

$$y = \sum_{k=1}^{K} A_k \left[ -b_{k0} + \sum_{n=1}^{N-1} b_{kn} 2^{-n} \right]
    = \sum_{n=1}^{N-1} \left[ \sum_{k=1}^{K} A_k b_{kn} \right] 2^{-n} + \sum_{k=1}^{K} A_k (-b_{k0}) \tag{3}$$

where K is the number of inputs or taps and N is the word length of the data (one sign bit and
N-1 magnitude bits).

The term $\sum_{k=1}^{K} A_k b_{kn}$ has only $2^K$ possible values, and the term
$\sum_{k=1}^{K} A_k (-b_{k0})$ also has only $2^K$ possible values. The size of the read only
memory (ROM) is therefore $2 \times 2^K$ words. The sums of coefficients are stored in the LUT and
the bits of the input samples are used as the address to access the contents of the LUT [11]. For
the 4-tap conventional DA filter, the required ROM size is $2 \times 2^4 = 32$ words. By applying
the OBC technique, the size of the ROM can be reduced.
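
The table lookup behind (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the hardware described in the paper: the names `build_da_lut` and `da_inner_product` are hypothetical, samples and coefficients are integers rather than fractions scaled below 1 (so the weights are 2^n instead of 2^-n), and a single 2^K-word table is used with the sign-bit term subtracted instead of a second table.

```python
def build_da_lut(coeffs):
    """Entry at address a holds the sum of coeffs[k] for every set bit k of a."""
    K = len(coeffs)
    return [sum(coeffs[k] for k in range(K) if (addr >> k) & 1)
            for addr in range(1 << K)]

def da_inner_product(coeffs, samples, nbits):
    """Compute sum(coeffs[k] * samples[k]) with lookups, shifts and adds only.

    samples are signed integers that fit in nbits two's-complement bits.
    """
    lut = build_da_lut(coeffs)
    mask = (1 << nbits) - 1
    words = [s & mask for s in samples]          # two's-complement bit patterns
    acc = 0
    for n in range(nbits):                       # one bit position per cycle
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> n) & 1) << k          # bit n of every sample forms the address
        term = lut[addr] << n                    # weight 2^n
        if n == nbits - 1:
            acc -= term                          # the sign bit carries negative weight
        else:
            acc += term
    return acc

coeffs = [3, -5, 7, 2]                           # hypothetical 4-tap coefficients
samples = [10, -128, 127, -1]                    # hypothetical 8-bit inputs
print(da_inner_product(coeffs, samples, 8))
print(sum(a * x for a, x in zip(coeffs, samples)))   # direct MAC for comparison
```

The loop runs once per bit of the input word, which is the bit serial behaviour that the later sections set out to speed up.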


2.2. Pipeline and Parallel Implementation of DA using OBC 

The OBC based DA architecture halves the number of logic elements required by the conventional
DA technique. The LUT used in the OBC based DA technique is reduced by half, i.e. for a 4-tap
filter the size of the ROM is reduced from 32 to 16 words. The expression used to reduce the ROM
size is given as


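$$x_k = \frac{1}{2}\left[ x_k - (-x_k) \right] \tag{4}$$

where $x_k$ is given as

$$x_k = -b_{k0} + \sum_{n=1}^{N-1} b_{kn} 2^{-n} \tag{5}$$

The two's complement of $x_k$ is given as

$$-x_k = -\bar{b}_{k0} + \sum_{n=1}^{N-1} \bar{b}_{kn} 2^{-n} + 2^{-(N-1)} \tag{6}$$

Substituting (5) and (6) in (4),

$$x_k = \frac{1}{2}\left[ -(b_{k0} - \bar{b}_{k0}) + \sum_{n=1}^{N-1} (b_{kn} - \bar{b}_{kn}) 2^{-n} - 2^{-(N-1)} \right] \tag{7}$$

Rewriting (7) by defining $c_{kn} = b_{kn} - \bar{b}_{kn}$ for $n \neq 0$ and
$c_{k0} = -(b_{k0} - \bar{b}_{k0})$, so that $c_{kn} \in \{-1, +1\}$,

$$x_k = \frac{1}{2}\left[ \sum_{n=0}^{N-1} c_{kn} 2^{-n} - 2^{-(N-1)} \right] \tag{8}$$

and the output of the FIR filter can be written as

$$y = \sum_{n=0}^{N-1} Q(b_n)\, 2^{-n} + 2^{-(N-1)} Q(0), \qquad
Q(b_n) = \sum_{k=1}^{K} \frac{A_k}{2} c_{kn}, \qquad
Q(0) = -\sum_{k=1}^{K} \frac{A_k}{2} \tag{9}$$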


The hardware circuit for the implementation of a k-tap filter is illustrated in [11]. This
technique is well suited to applications in which the filter coefficients are fixed. The size of
the ROM increases exponentially with the number of taps. The reduction in LUT size obtained with
the OBC technique is shown in Table 1, but there is still a need to reduce the LUT size further
and to increase the computation efficiency for real time implementation of the filter. To improve
the speed of computation, two bits are processed at a time (2-BAAT) instead of one bit at a time
(1-BAAT). The total number of clock cycles required for an N bit input word is N if one bit is
processed at a time. For 2-BAAT, the speed is increased by a factor of M if the N bit word is
divided into M subwords.
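
The effect of processing more than one bit per clock cycle can be sketched as follows. This is an illustrative Python model under stated assumptions, not the authors' circuit: the name `da_multi_baat` and its `baat` parameter are hypothetical, integer scaling is used, and each pass of the outer loop stands for one clock cycle in which `baat` table lookups would run side by side in hardware.

```python
def da_multi_baat(coeffs, samples, nbits, baat=2):
    """DA inner product that consumes `baat` bits of every sample per cycle.

    Returns (result, cycles). Samples are signed integers of nbits bits.
    """
    K = len(coeffs)
    lut = [sum(coeffs[k] for k in range(K) if (a >> k) & 1) for a in range(1 << K)]
    mask = (1 << nbits) - 1
    words = [s & mask for s in samples]
    acc, cycles = 0, 0
    for base in range(0, nbits, baat):                    # one pass models one clock cycle
        for n in range(base, min(base + baat, nbits)):    # lookups done in parallel in hardware
            addr = sum(((w >> n) & 1) << k for k, w in enumerate(words))
            term = lut[addr] << n
            acc += -term if n == nbits - 1 else term      # sign bit has negative weight
        cycles += 1
    return acc, cycles

coeffs = [3, -5, 7, 2]                 # hypothetical 4-tap coefficients
samples = [10, -128, 127, -1]          # hypothetical 8-bit inputs
print(da_multi_baat(coeffs, samples, 8, baat=1))   # 1-BAAT: 8 cycles
print(da_multi_baat(coeffs, samples, 8, baat=2))   # 2-BAAT: 4 cycles
```

Both calls return the same inner product; only the cycle count changes, at the cost of one extra lookup path per additional bit handled in a cycle.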


Table 1. K=4 Tap FIR Filter LUT Content Using OBC (w0 to w3 are the filter coefficients)

x0n  x1n  x2n  x3n   LUT contents
0    0    0    0     -1/2 (w0 + w1 + w2 + w3)
0    0    0    1     -1/2 (w0 + w1 + w2 - w3)
0    0    1    0     -1/2 (w0 + w1 - w2 + w3)
0    0    1    1     -1/2 (w0 + w1 - w2 - w3)
0    1    0    0     -1/2 (w0 - w1 + w2 + w3)
0    1    0    1     -1/2 (w0 - w1 + w2 - w3)
0    1    1    0     -1/2 (w0 - w1 - w2 + w3)
0    1    1    1     -1/2 (w0 - w1 - w2 - w3)
1    0    0    0      1/2 (w0 - w1 - w2 - w3)
1    0    0    1      1/2 (w0 - w1 - w2 + w3)
1    0    1    0      1/2 (w0 - w1 + w2 - w3)
1    0    1    1      1/2 (w0 - w1 + w2 + w3)
1    1    0    0      1/2 (w0 + w1 - w2 - w3)
1    1    0    1      1/2 (w0 + w1 - w2 + w3)
1    1    1    0      1/2 (w0 + w1 + w2 - w3)
1    1    1    1      1/2 (w0 + w1 + w2 + w3)
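
The structure of Table 1 can be reproduced with a short Python sketch. The function name `obc_lut` and the coefficient values are hypothetical; the sketch assumes each address bit maps to a sign of -1 for a 0 bit and +1 for a 1 bit, and that x0n is the most significant address bit.

```python
from fractions import Fraction

def obc_lut(w):
    """Full OBC table: entry = 1/2 * sum(w[k] * c_k), with c_k = +1 for bit 1, -1 for bit 0.

    The first tap (x0n) is taken as the most significant address bit.
    """
    K = len(w)
    table = []
    for addr in range(1 << K):
        bits = [(addr >> (K - 1 - k)) & 1 for k in range(K)]
        table.append(Fraction(1, 2) * sum(wk * (2 * b - 1) for wk, b in zip(w, bits)))
    return table

w = [5, 3, 2, 1]                       # hypothetical coefficients w0..w3
table = obc_lut(w)
for addr, entry in enumerate(table):
    print(format(addr, "04b"), entry)

# The lower half is the negated mirror image of the upper half.
print(all(table[a] == -table[15 - a] for a in range(16)))
```

Because every entry with x0n = 1 is the negative of the entry at the complemented address, only the eight entries with x0n = 0 need to be stored, provided the address can be complemented and the output sign flipped.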





The proposed parallel and pipelined DA architecture is shown in Figure 2. It consists of two
8-word ROMs, two multiplexers, a shift and accumulate unit, a pipeline register and an adder. The
input x0n acts as a control bit for the sign and reduces the ROM size by half. The output of each
ROM is given as input to a multiplexer. The switch s1 is 0 when n=0 and 1 otherwise. The switch
s3 is 1 when n=N-1 and 0 otherwise. The output of the multiplexer is shifted and added as per (9).

Figure 2. DA based architecture for 2-BAAT using OBC
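
A behavioural Python sketch of this arrangement is given below. It is only a model under stated assumptions and not the authors' Verilog design: the names `build_half_rom`, `rom_lookup` and `obc_da_2baat` are hypothetical, integer scaling is used, the ROM stores twice the OBC entry so that all values stay integers, the switches s1 and s3 are modelled simply as the initial accumulator value and the sign inversion at the sign-bit position, and the pipeline register is not modelled.

```python
def build_half_rom(coeffs):
    """ROM with 2^(K-1) words holding 2*Q for every address whose tap-0 bit is 0."""
    K = len(coeffs)
    rom = []
    for addr in range(1 << (K - 1)):
        total = -coeffs[0]                       # tap-0 bit is 0, so its sign is -1
        for k in range(1, K):
            bit = (addr >> (K - 1 - k)) & 1
            total += coeffs[k] if bit else -coeffs[k]
        rom.append(total)
    return rom

def rom_lookup(rom, bits):
    """bits[0] (x0n) complements the address and flips the output sign when it is 1."""
    x0 = bits[0]
    addr = 0
    for b in bits[1:]:
        addr = (addr << 1) | (b ^ x0)
    return -rom[addr] if x0 else rom[addr]

def obc_da_2baat(coeffs, samples, nbits):
    """OBC based DA inner product, two bits per cycle. Returns (result, cycles)."""
    if nbits % 2:
        raise ValueError("this sketch assumes an even word length")
    rom = build_half_rom(coeffs)
    mask = (1 << nbits) - 1
    words = [s & mask for s in samples]
    acc = -sum(coeffs)                           # constant offset term, twice Q(0)
    cycles = 0
    for n in range(0, nbits, 2):                 # one pass models one clock cycle
        even = rom_lookup(rom, [(w >> n) & 1 for w in words])
        odd = rom_lookup(rom, [(w >> (n + 1)) & 1 for w in words])
        if n + 1 == nbits - 1:
            odd = -odd                           # sign-bit position has negative weight
        acc += (even + 2 * odd) << n
        cycles += 1
    return acc // 2, cycles                      # acc is always even; undo the factor 2

coeffs = [3, -5, 7, 2]                           # hypothetical 4-tap coefficients
samples = [10, -128, 127, -1]                    # hypothetical 8-bit inputs
print(obc_da_2baat(coeffs, samples, 8))
print(sum(a * x for a, x in zip(coeffs, samples)))   # direct MAC for comparison
```

For four taps each lookup addresses only eight words, and the two lookups per cycle correspond to the two ROMs in the description above.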


3. RESULT AND DISCUSSION 

The proposed parallel and pipelined DA using OBC is designed in Verilog. ModelSim XE 6.3c is
used for simulation, and synthesis is carried out with the Xilinx ISE 12.41 design tool (Family:
Spartan 3, Device: XC3S50, Speed grade: -5). Table 2 shows the comparison of the proposed
parallel and pipelined DA using OBC with the existing method.


Table 2. Comparison of Proposed Parallel and Pipelined based DA using OBC and Existing Method

Type                                           Slices   Flip flops   Delay (ns)   Power (mW)
Existing method for K=4 and 8 bit [20]         35       44           5.089        386
Existing method for K=4 and 16 bit [20]        55       80           3.727        1606
Proposed method for K=4 and 8 bit based DA     23       29           4.506        213
Proposed method for K=4 and 16 bit based DA    51       58           4.579        234





It is observed from Table 2 that, for K=4 and 8 bit, the slices are reduced by 34.29%, the flip
flops by 34.09% and the power consumption by 44.82% in comparison with the existing method. For
K=4 and 16 bit, the flip flops and the power consumption are reduced by 27.5% and 85.43%
respectively when compared to the existing method. The delay is lower than that of the existing
method for 8 bit, but higher for 16 bit (4.579 ns against 3.727 ns). The proposed method achieves
low area and power consumption with nominal delay, which makes it suitable for portable
communication devices.
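
The percentage figures can be recomputed from the entries of Table 2 with a short Python sketch. The dictionary layout and the helper name `reduction_percent` are illustrative choices; the numbers themselves are copied from Table 2.

```python
def reduction_percent(existing, proposed):
    """Percentage reduction of the proposed value relative to the existing one."""
    return round(100.0 * (existing - proposed) / existing, 2)

# (slices, flip flops, delay in ns, power in mW) copied from Table 2.
table2 = {
    8:  {"existing": (35, 44, 5.089, 386),  "proposed": (23, 29, 4.506, 213)},
    16: {"existing": (55, 80, 3.727, 1606), "proposed": (51, 58, 4.579, 234)},
}
metrics = ("slices", "flip flops", "delay", "power")

results = {}
for bits, row in table2.items():
    for name, old, new in zip(metrics, row["existing"], row["proposed"]):
        results[(bits, name)] = reduction_percent(old, new)
        print(f"{bits:>2} bit {name:<10} {results[(bits, name)]:7.2f} %")
```

A negative value marks an increase rather than a reduction, which is the case for the delay of the 16 bit design.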


4. CONCLUSION 

A parallel and pipelined DA filter using OBC with a bit pairing technique is proposed for SDR
applications. The OBC technique reduces the size of the LUT in DA by half. The bit pairing
technique processes two bits at a time, which increases the processing capability and computation
efficiency compared with the existing method. The proposed method, with parallel processing of
the ROMs and pipelining, offers more than 80% power reduction and more than 25% reduction in flip
flops when the word length is increased from 8 to 16 bits. In future, the LUT can be updated
automatically for time varying filter coefficients in SDR applications.